Phylogeography of Chinese cereal cyst nematodes sheds lights on their origin and dispersal

Abstract Reconstructing the dispersal routes of pathogens can help identify the key drivers of their evolution and provides a basis for disease control. The cereal cyst nematode Heterodera avenae is one of the major nematode pests on cereals that can cause 10%–90% crop yield losses worldwide. Through extensive sampling on wheat and grasses, the Chinese population of H. avenae is widely identified in virtually all wheat growing regions in China, with H1 being the predominant haplotype. The monoculture of wheat in north China might have been the key driver for the prevalence of H1 population, which should date no earlier than the Han Dynasty (202 BCE–220 CE). Molecular phylogenetic and biogeographic analyses of Chinese H. avenae suggest a Pleistocene northwest China origin and an ancestral host of grasses. We assume that the prosperity of Heterodera in this region is a result of their preference for cooler climate and various grass hosts, which only appeared after the uplift of Qinghai‐Tibetan Plateau and aridification of Inner Asia. Nematode samples from the current and historical floodplains show a significant role of the Yellow River in the distribution of Chinese H. avenae. Whereas mechanical harvesters that operate on an inter‐provincial basis suggest the importance in the transmission of this species in eastern China in recent times. This study highlights the role of environmental change, river dynamics, and anthropogenic factors in the origin and long‐distance dissemination of pathogens.


Part 3:
Analysis of genetic structure for Chinese Heterodera avenae populations based on SSR microsatellite markers

DNA extraction
The cysts were recovered from wheat root samples collected at different locations (Table   S1). Prior to DNA extraction, the identity of recovered Chinese Heterodera avenae population (CHA) were confirmed both by morphology and molecular barcoding using COI gene. The cysts were incubated at 4 °C for 6 weeks then at 16 °C to stimulate the hatching of pre-parasitic second-stage juveniles (J2s). For each population, J2s from 32 individual cysts were extracted separately following the protocol described by Adam et al. (2007), resulting a total of 416 individual cysts from 371 samples were used further analyses (Table S1).

SSR amplification and genotyping
A total of 9 pair primers were used for SSR amplification (Table S7)

Data analyses
The FSTAT 2.9.3 program (Goudet, 1995) was used to calculate the fixation index (FST) and the individual inbreeding coefficient (Fis) among and within the population. The MSAnalyzer 4.05 (Dieringer & Schlötterer, 2003) was used to calculate the genetic distance between J2 individuals based on shared alleles, and GenAlex 6.5 (Peakall & Souse, 2006) was used for PCoA analysis using the genetic distance matrix. STRUCTURE v2.3.4 was used to analyze the population structure based on the Bayesian clustering method (Pritchard et al., 2000), and the mixed ancestry model and the allele frequency association model were used to carry out 10 6 repeated Markov Monte Carlo Chain (MCMC) search, and discard the first 10 5 clustering results to determine the optimal number of clusters (K).The K value is set from1 to 13, and each K value is run 20 times to calculate the LnP(D) corresponding to each K value, and then calculate the ΔK value through LnP(D) to obtain the difference between the K value and ΔK (Evanno et al., 2005). The 20 replicates were aligned and integrated using CLUMPP v1. 1.2 software (Jakobsson & Rosenberg, 2007), and finally plotted using DISTRUCT (Rosenberg, 2004 Wilcoxon signed-rank test method in BOTTLENECK v1.2.02 (Cornuet & Luikart, 1996) was used to test whether each population had experienced bottleneck effect, based on infinite allele model (IAM) and two-phase model (TPM) and stepwise mutation model (SMM), and three models were tested separately (Cristescu et al., 2010). In order to clarify whether geographic isolation aggravates the graminearum cyst nematode Genetic differentiation of populations. The Mantel test implied R package "ade4" was used to test the fixation index FST among populations based on microsatellite data and the average geographic distance of the sampling area.

Genetic diversity of the CHA population
In general, we recovered a low genetic diversity among studied population (Table S8)  Sample size (N), average number of alleles per locus (Na), the effective number of alleles (Ne), the observed heterozygosity (Ho), the expected heterozygosity (He), unbiased expected heterozygosity (uHe), deviation from HWE for heterozygosity deficit and estimator of the fixation index (Fis); ns, not significant, * P < 0.05, ** P < 0.01, *** P < 0.001 Population structure of CHA Genetic structure of studied population suggested a moderate to high FST value, with the average fixation index as 0.162, and the 95% confidence interval is between 0.094 and 0.224 (Table S9). The smallest fixation index (FST = 0.004) was found between Beijing (BJ) and Tianjin (TJ) while the largest (FST = 0.435) was between Qinghai (QH) and Ningxia (NX) population. Mantel test was performed using FST as the genetic distance against their geographic distance. The results suggested that there was a significant positive correlation between genetic distance and geographic distance (r = 0.284, P = 0.038), and the CHA population followed the isolation-by-distance pattern (IBD) pattern (Fig. S2). The PcoA (Fig. S3) showed that the CHA populations were clustered into three groups, i.e.: Beijing-Tianjin region, central region and northwest region. With the coordinate 1 explains 52.78% and coordinate 2 explains 15.29% of variance, two axes explain a total of 68.07% variance. We further constructed the NJ phylogenetic tree based on the Cavalli-Sforza & Edwards genetic distance and resulted topology was generally consistent with the clustering in principal coordinate analysis (Fig. S4) In Bayesian cluster analysis based on STRUCTURE, the highest △K value was achieved when K = 2 (Figure 5a), but △K remain relatively high when K = 3. Only after the LnP(K) value greater than 3 both △K and LnP(K) tends to be stable (Fig. 5a,4b). When K = 2, the studied populations can be clustered into two groups, namely the Beijing-Tianjin-Northwest (green, Group I) and the central region (red, Group II), and the gene mixing is more obvious. When K = 3, the populations were clustered into three groups, namely the 8 Beijing-Tianjin region (Clade I), northwest region (Clade II) and central region (Clade III). All groups within Clade II and Clade II groups had mixed genotypes (Fig. S6).

The analysis of molecular variation model
The results of AMOVA (Table S10) showed that 83.77% of the variation came within the population (intra-population) in the case of no grouping. When the population was clustered into two groups (K = 2), the intra-population variation rate accounted for 78.03%, the inter-groups variation rate was 12.72%, and the inter-populations variation rate was 9.25%. When the population was clustered into 3 groups (K = 3), the population variation rate was similar to that of the two groups, the intra-population variation rate accounted for 78.09%, and the inter-group variation rate was 13.91%, and the variation rate between populations was 7.99%.

Detection of possible bottleneck effect
The BOTTLENECK software was used to detect the bottleneck effect was calculated based on the three models of IAM, TPM and SMM. The data are shown in Table S11.